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Goddard Earth Science DISC 



NASA GES DISC offers atmosphere-related observational and 
model data and applied services. 

Data in missions: 

> TRMM (PR, TMI, VIRS), 

> Terra (MODIS, ASTER), 

> Aqua (AIRS, MODIS, AMSU-A, HSB), 

> Aura (MLS, HIRDLS, OMI, TES), 

> CloudSat, 

> CALIPSO, etc. 

Services and Tools: 

> Mirador, 

> Giovanni, 

> OPeNDAP, 

> GrADS, 

> OGC WMS, 

> FTP, etc. 

NASA/GMU CSISS 

Page 2 


02/23/2012 


UCAR SEA Software Engineering Conference 2012 



NASA 
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Water Vapor from AIRS MODIS vs SeaWiFS Chlorophyll Ozone Hole from OMI 


Courtesy of Suhung Shen, NASA GES DISC 
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Goddard Earth Science DISC - Visualization 
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Goddard Earth Science DISC 



http://disc.gsfc.nasa.gov 
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> The migrating procedure 
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> Summary 
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Background -1 


❖ Cloud computing has been implemented and used by a number of commercial 
companies (e.g. Amazon EC2 [IaaS, 2006], Google App Engine [PaaS, 2008], 
Microsoft Azure [PaaS, 2008]). 

❖ NASA launched Nebula in 2008 to provide Infrastructure as a Service (IaaS). 

a) Make NASA realize significant cost savings 
through efficient resource utilization, reduced 
energy consumption, and reduced labor costs. 

b) Provide an easier way for NASA scientists and 
researchers to efficiently explore and share 
large and complex data sets. 

c) Allow customers to provision, manage, and 
decommission computing capabilities on an 
as-needed basis. 

NASA Nebula: http://nebula.nasa.gov/ 
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Background -2 



❖ GES DISC has been evaluating the feasibility and suitability of migrating GES 
DISC's applications to the Nebula platform by porting the following projects. 

a) Using Nebula Cloud to run scientific data processing infrastructure 

S4PM is an open source data processing infrastructure. Based on S4PM, 
scientific data processing algorithms can be run to efficiently process large 
volumes of satellite data. http://sourceforge.net/projects/s4pm/ 

b) Using Nebula Cloud to run scientific data processing workflow 

The Atmospheric Infrared Sounder (AIRS) focuses on supporting climate 
research and improving weather forecasting. Based on S4PM, the AIRS Level 
1 & Level 2 algorithm workflow, consisting of many sub-algorithms 
(executables), processes large volumes of AIRS Level 0 data to produce 
Level 1 data as intermediate results, and finally outputs Level 2 data 
products. 
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Background -3 



c) Porting a Web-based scientific data processing application to Nebula 
Cloud 

Giovanni is a Web-based application that offers online visualization and 
analysis of vast amounts of Earth science data. The Giovanni MAPSS 
(Multi-sensor Aerosol Products Sampling System) portal focuses on visualizing 
aerosol relationships among ground-based data and satellite data. 

❖ The experiences, lessons learned, and tutorials will expedite our future 
efforts to utilize Nebula/Cloud computing technologies to process Earth 
Science data. 
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System Architecture 
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The Migrating Procedures 
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Performance Estimation -1 



-- Hardware Information 

[Table columns: Hardware | CPU (GHz) | RAM (GB) | Cache Size (MB) | Storage | CPU Microarchitecture, e.g. Core (65nm) / Penryn (45nm)] 
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Performance Estimation -2 


S4PM/GUI for PREPQC, AIRS L1 & L2 Processing 
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Performance Comparison -1 


-- Two-day AIRS L2 Processing at Nebula box and Local box 

Input Data (L1): Calibrated and geolocated radiance in physical units, e.g. brightness temperature in Kelvin (K). 

Output Data (L2): Retrieved physical variables, e.g. temperature, humidity and ozone profiles, total precipitable water, cloud top height. 

Two days (2010.123-124)   Local Server   Nebula 1   Nebula 2 
Input Volume (GB)         33.1           33.1       33.1 
Output Volume (GB)        12.16          12.2       12.2 
Elapsed Time (hours)      103.05         133.60     134.13 
CPU Time (hours)          102.90         35.67      35.80 
System Time (minutes)     22.47          11.27      11.27 

[Bar chart of the five metrics above for s4pt local, Nebula 1, and Nebula 2] 
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Performance Comparison -2 


-Stable and consistent processing at Nebula box and Local box 


AIRS L2 processing at s4pt and Nebula box 

Input Data (L1): Calibrated and geolocated radiance in physical units, e.g. brightness temperature in Kelvin (K). 

Output Data (L2): Retrieved physical variables, e.g. temperature, humidity and ozone profiles, total precipitable water, cloud top height. 


One day (2010.123)           Local Server   Nebula 1   Nebula 2 
Input Volume L1 data (GB)    15.3           15.3 
Output Volume L2 data (GB)   6.06           6.11 
Elapsed Time (hours)         52.47          17.76 
CPU Time (hours)             52.34          17.76 
System Time (minutes)        10.5           4.34 



Two days (2010.123-124)      Local Server   Nebula 1   Nebula 2 
Input Volume L1 data (GB)    33.1           33.1       33.1 
Output Volume L2 data (GB)   12.16          12.2       12.2 
Elapsed Time (hours)         103.05         133.60     134.13 
CPU Time (hours)             102.90         35.67      35.80 
System Time (minutes)        22.47          11.27      11.27 


Three days (2010.123-125)    Local Server   Nebula 1   Nebula 2 
Input Volume L1 data (GB)    48.8           48.8       48.8 
Output Volume L2 data (GB)   18.3           18.3       18.3 
Elapsed Time (hours)         154.39         207.84     207.83 
CPU Time (hours)             154.14         55.31      55.32 
System Time (minutes)        32.87          17.23      17.30 
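
The CPU-time rows above can be reduced to a single local-to-Nebula ratio per run. The following is a minimal sketch of that arithmetic using the Local Server and Nebula 1 values from the tables; the dictionary layout and variable names are illustrative assumptions, not part of the original slides.

```python
# CPU time in hours, copied from the tables above: (Local Server, Nebula 1).
cpu_hours = {
    "one day": (52.34, 17.76),
    "two days": (102.90, 35.67),
    "three days": (154.14, 55.31),
}

# Ratio of local CPU time to Nebula 1 CPU time, rounded to two decimals.
ratios = {
    label: round(local / nebula, 2)
    for label, (local, nebula) in cpu_hours.items()
}

for label, ratio in ratios.items():
    print(f"{label}: local CPU time is {ratio}x Nebula 1")
```

Under these figures the ratio stays close to 3 across the one-, two-, and three-day runs, which is one way to read the "stable and consistent" claim in the slide title.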
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Performance Comparison -3 


-- Two-day AIRS L1 & L2 Processing at Nebula box and Local box 

Input Data (L0): Raw data from AIRS, AMSU-A1, AMSU-A2 instruments, and data about the spacecraft. 

Output Data (L2): Retrieved physical variables, e.g. temperature, humidity and ozone profiles, total precipitable water, cloud top height. 

Two days (2010.123-124)      Local Server   Nebula 1   Nebula 2 
Input Volume (GB)            29.11          29.11      29.11 
Output Volume L2 data (GB)   12.14          11.61      11.64 
Output Volume all (GB)       77.47          74.37      74.35 
Elapsed Time (hours)         121.70         157.00     43.11 
CPU Time (hours)             120.98         42.80      41.52 
System Time (minutes)        70.02          34.43      29.04 

[Bar chart of the six metrics above for s4pt local, Nebula 1, and Nebula 2] 
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Hardware Performance Analysis 




CPU (GHz)           8 cores * 3.16     16 cores * 2.8 
Microarchitecture   65nm NetBurst      45nm Nehalem / 32nm Westmere 

           NetBurst Microarchitecture              Nehalem/Westmere Microarchitecture 
Cache L3   N/A                                     2 MB/core 
FSB        Dual Independent 800MHz                 QPI = 6.4 GT/s (QuickPath Interconnect) 
Memory     DDR-2 400 ECC SDRAM (double channel)    DDR-3 (triple channel) 

NetBurst (65nm) --> Core (65nm) / Penryn (45nm) --> Nehalem (45nm) / Westmere (32nm) 

Core = 2.5 x NetBurst 
Penryn = 1.8 x Core 
Nehalem/Westmere = (1.2-2.0) x Penryn 
Nehalem/Westmere = (5.4-9.0) x NetBurst 
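
The final range follows from chaining the per-generation factors listed above. A minimal sketch of that arithmetic, assuming the factors compose multiplicatively (the variable names are illustrative and do not come from the slides):

```python
# Per-generation speedup factors quoted on the slide.
core_vs_netburst = 2.5
penryn_vs_core = 1.8
nehalem_vs_penryn = (1.2, 2.0)  # low and high estimates

# Assumption: generation-to-generation factors multiply.
penryn_vs_netburst = core_vs_netburst * penryn_vs_core

# Low and high estimates for Nehalem/Westmere relative to NetBurst.
nehalem_vs_netburst = tuple(
    round(penryn_vs_netburst * factor, 2) for factor in nehalem_vs_penryn
)

print(nehalem_vs_netburst)  # (5.4, 9.0)
```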
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Performance Analysis -1 


Two-day AIRS L2 Processing at Nebula box and Local box 

                      Local Box                                       Nebula Box 4                              Nebula Box 4 
                      (off-peak time)                                 (off-peak time)                           (peak time) 
Data Date             2010.123-124 (2010.05.02-03)                    2010.123-124 (2010.05.02-03) 
Input Data            ~33.1 GB                                        ~33.1 GB                                  ~33.1 GB 
Output Data           ~12.16 GB                                       ~12.2 GB                                  ~12.2 GB 

Spent Time 
- acquire data        17m 14s (21:36:33 - 21:53:47)                   6m 33s (14:01:25 - 14:07:58)              6m 34s (07:58:32 - 08:05:06) 
- register data 
- select data         16m 5s (21:36:36 - 21:52:41)                    4m 26s (14:01:31 - 14:06:57)              5m 36s (07:58:37 - 08:04:13) 
- find data           7h 10m 58s (21:36:37 - 04-28 04:47:35)          7h 5m 12s (14:01:34 - 21:06:46)           7h 4m 54s (07:58:40 - 15:03:34) 
- prepare run         7h 10m 59s (21:36:38 - 04-28 04:47:37)          7h 5m 18s (14:01:37 - 21:06:55)           7h 4m 56s (07:58:43 - 15:03:39) 
- allocate disk 
- run algorithm       52h 11m 34s (04-27 21:36:40 - 04-30 01:48:14)   10h 34m 23s (14:01:40 - 04-28 00:36:03)   10h 34m 15s (07:58:46 - 18:32:01) 
- register local data 52h 10m 40s (21:36:41 - 04-30 01:48:21)         10h 34m 27s (14:01:44 - 04-28 00:36:11)   10h 33m 16s (07:58:49 - 18:32:05) 
- export              52h 10m 40s (21:36:42 - 04-30 01:48:22)         10h 34m 23s (14:01:46 - 04-28 00:36:09)   10h 33m 17s (07:58:52 - 18:32:09) 
- track data          52h 10m 47s (21:36:44 - 04-30 01:48:31)         10h 34m 25s (14:01:50 - 04-28 00:36:15)   10h 33m 15s (07:58:55 - 18:32:10) 
- sweep data          52h 10m 40s (21:36:46 - 04-30 01:48:25)         10h 34m 1s (14:02:12 - 04-28 00:36:13)    10h 33m 10s (07:59:05 - 18:32:15) 
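
To compare steps across machines, the durations above can be converted to seconds. The sketch below does this for the "run algorithm" step (52h 11m 34s on the local box versus 10h 34m 23s on Nebula Box 4 at off-peak time). The helper `to_seconds` is a hypothetical name introduced here for illustration; it is not part of S4PM.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '52h 11m 34s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)\s*([hms])", duration)
    return sum(int(value) * units[unit] for value, unit in parts)


# "run algorithm" durations taken from the table above.
local_run = to_seconds("52h 11m 34s")
nebula_run = to_seconds("10h 34m 23s")

# How many times longer the local run took than the Nebula off-peak run.
speedup = round(local_run / nebula_run, 2)

print(local_run, nebula_run, speedup)  # 187894 38063 4.94
```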
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Performance Analysis -2 


Two-day AIRS L2 Processing at Nebula box and Local box 






Performance comparison: AWS, Nebula, and local 


PREPQC 2011-111 (one-day) 

One day (2011.111)       Amazon WS                  Local Linux box             Nebula 
Hardware Information     t1.micro: 613MB RAM,       16GB RAM, 4 * 3.16GHz       8GB RAM, 2 * 2.8GHz 
                         up to 2 * 1.2GHz           NetBurst-based dual-core    Nehalem-based quad-core 
                         Nehalem-based processor    processor (8 cores)         processor (8 cores) 
Input Volume (MB)        184.67                     184.67                      184.67 
Output Volume (MB)       14.89                      14.95                       15.01 
Elapsed Time (seconds)   4980.25, 5745.15           191.05                      88.83 
CPU Time (seconds)       2253.23, 2395.79           184.21                      85.51 
System Time (seconds)    82.29, 99.42               7.58                        3.16 




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Cost Estimation and Comparison -1 


-- Nebula charge policies 

❖ CPU charges 

-- $0.12 per CPU-hr. 

-- $0.48 per hour for an m1.large instance, which uses 4 CPUs. 

-- The charge applies whenever an instance is running, whether or not it is 
processing. 

-- Cloud applications should be designed to terminate non-processing instances 
wherever possible. 


❖ Storage charges 

-- $0.15 per GB-month applies to Volume storage and to Object Store storage. 
-- No charge for the internal storage that comes with an instance (100GB). 

-- Nebula does not charge for the storage used for your images themselves. 
-- Nebula does not charge for inputs and outputs, puts and gets, or network 
bandwidth usage. 
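
The charge policy above can be turned into a small cost estimate. This is a minimal sketch under the stated rates only; the function names, the one-week run, and the 100 GB volume are illustrative assumptions, not figures from the slides.

```python
CPU_RATE = 0.12      # dollars per CPU-hour (from the charge policy above)
STORAGE_RATE = 0.15  # dollars per GB-month for Volume or Object Store storage


def instance_cost(cpus: int, hours: float) -> float:
    """Charge for a running instance, billed whether or not it is processing."""
    return round(CPU_RATE * cpus * hours, 2)


def storage_cost(gigabytes: float, months: float) -> float:
    """Charge for volume or object storage."""
    return round(STORAGE_RATE * gigabytes * months, 2)


hourly = instance_cost(cpus=4, hours=1)             # m1.large for one hour
week_running = instance_cost(cpus=4, hours=24 * 7)  # hypothetical: left running for a week
volume_month = storage_cost(gigabytes=100, months=1)  # hypothetical 100 GB volume

print(hourly, week_running, volume_month)  # 0.48 80.64 15.0
```

Because the instance charge accrues even when idle, the one-week figure illustrates why terminating non-processing instances matters.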
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Output Verification -1 


AIRX2SUP (AIRS L2 Products) 



[HDFView screenshots: ImageView panels comparing AIRX2SUP.A2010123.0529.005.2011158163131.hdf (Nebula output) with AIRX2SUP.A2010123.0529.005.2011153003601.hdf (s4pt output)] 

Datasets listed in the file tree: rain_rate_50km, rain_rate_15km, Qual_Precip_Est, IR_Precip_Est, IR_Precip_Est_Err, Qual_Clim_Ind, Tropo_CCI, Tropo_CCI_Est_Err, Strato_CCI, Strato_CCI_Est_Err, MWSurfClass, SurfClass, FracLandPlusIce, sfcTbMWStd, EmisMWStd. 
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Output Verification -2 


AIRX2RET (AIRS L2 Products) 



[HDFView screenshots: ImageView panels comparing AIRX2RET.A2010124.0759.005.2011159010921.hdf (Nebula output) with AIRX2RET.A2010124.0759.005.2011153163705.hdf (s4pt output)] 

Datasets listed in the file tree: num_H2O_Func, H2O_verticality, Qual_O3, totO3Std, totO3StdErr, O3VMRStd, O3VMRStdErr, num_O3_Func, O3_verticality, Qual_CO, CO_total_column, num_CO_Func, CO_trapezoid_layers, CO_eff_press, CO_VMR_eff, CO_VMR_eff_err. 
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Advantages -1 



❖ User-friendly interfaces to access and manage Nebula resources 
-- Dashboard: simple and convenient web interface 

-- Euca2ools: fast and powerful command line tools 

❖ Better Performance, compared with local box (details in appendix C) 

❖ Lower cost, only pay for used time and resources (details in appendix C) 

❖ Scalability: on-demand provisioning of resources in near real time, 
without user involvement, for peak loads. 

❖ Cloning: a simple bundling process saves a modified/improved image. This is 
an excellent way to maintain, back up, and mirror systems, which 
increases reliability. 
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Advantages -2 



❖ VPN-based high security (1024-bit private and public keys and X.509 certificates), 
with easy login using private keys. 

❖ Knowledge base: 

-- Detailed how-to instructions for using Nebula via Dashboard and 
Euca2ools. 

-- Fairly comprehensive FAQ, covering most common questions. 

-- Helpful tutorial video for getting started. 

❖ Nebula Forum, good venue for additional materials, user encountered bugs, 
solutions, and discussion. 

❖ Nebula team support, responsive and eager to help; prompt response to 
general questions and resolving commonly encountered problems. 
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Challenges -1 



❖ Stability 


-- Instances are not stable: operational access may be lost, and instances then have to be 
rebooted. Before rebooting an instance, all attached volumes have to be detached. 

-- Network transfers (FTP/wget) between Nebula and local machines are slow and unstable. 
Complications may arise when users ssh into Nebula instances during 
data transfers via FTP/wget (e.g. login failures, frequent FTP timeouts, and throughput 
stalls). 

❖ Underdeveloped 

-- Object Store not yet available 

-- Lack of tools for managing and monitoring running instances (e.g. Elastic Load 
Balancing, CloudWatch, Auto Scaling, etc.). 

❖ Images, Volumes & Bundles 

-- Bare-bones images lack basic software packages (e.g. gcc, X11). 

-- When a volume is attached, the actual device location may not correspond 
to the one specified (e.g. /dev/vdh may end up as /dev/vdg). 

-- Any defects in the image you start with will be bundled up with your instance into 
your resulting image (defects in CentOS images result in bundling issues). 
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Challenges -2 



❖ Gaps in Knowledge Base 

-- Lack of information on Nebula provided images. 

-- No troubleshooting tools. 

-- No details on the hardware and basic software used by Nebula. 

❖ Communication Concerns 

-- Telecon: Nebula used to hold periodic telecons for end users to discuss problems, 
needs, and defects. It would be beneficial if these returned. 
-- Technical support: Faster and more efficient technical support is needed. 
-- Forum: Turnaround for technical questions is long. Some posts receive no response. 

❖ Size Limitation 

-- Instances: Maximum of 5 instances per project 

-- Volumes: 100GB volume storage per project (* exceptions can be requested directly) 

-- CPUs: 16 cores. 

❖ Commercial Software 

-- Uncertainty about installing 3rd-party commercial software on Nebula (e.g. 
license issues when using instances with 3rd-party software in other projects). 
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Summary 



> Three applications were successfully migrated to Nebula, 
including S4PM, AIRS L1/L2 algorithms, and Giovanni MAPSS. 

> Nebula has some advantages compared with local machines 
(e.g. performance, cost, scalability, bundling, etc.) 

> Nebula still faces some challenges (e.g. stability, object storage, 
networking, etc.). 

> Migrating applications to Nebula is feasible but time-consuming. 

> Lessons learned from our Nebula experience will benefit future 
Cloud Computing efforts at GES DISC. 
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Thank You for your attention ! 


Any Questions ? 
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